1. Introduction

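The empirical measure $\mathbb{P}_n$ and empirical process $\mathbb{G}_n$ of a sample of observations
$X_1,\ldots,X_n$ from a probability measure $P$ on a measurable space $(\mathcal{X},\mathcal{A})$ attach
to a given measurable function $f\colon\mathcal{X}\to\mathbb{R}$ the numbers

$$\mathbb{P}_nf=\frac1n\sum_{i=1}^nf(X_i),\qquad\mathbb{G}_nf=\frac1{\sqrt n}\sum_{i=1}^n\bigl(f(X_i)-Pf\bigr).$$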

It is often useful to study the suprema of these stochastic processes over a given
class $\mathcal{F}$ of measurable functions. The distribution of the supremum

$$\|\mathbb{G}_n\|_{\mathcal{F}}:=\sup_{f\in\mathcal{F}}|\mathbb{G}_nf|$$

is known to concentrate near its mean value, at a rate depending on the size of
the envelope function of the class $\mathcal{F}$, but irrespective of its complexity. On the
other hand, the mean value of $\|\mathbb{G}_n\|_{\mathcal{F}}$ depends on the size of the class $\mathcal{F}$. Entropy
integrals, of which there are two basic versions, are useful tools to bound this
mean value.
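As a concrete illustration of these definitions, the following minimal Python sketch computes $\mathbb{P}_nf$, $\mathbb{G}_nf$ and the supremum over a finite class. The sample distribution (uniform on [0, 1]), the class of indicator functions and all function names are assumptions chosen for illustration and do not come from the text; the true means $Pf$ must be supplied by the caller.

```python
import math
import random

def empirical_measure(f, sample):
    """P_n f: the sample average of f."""
    return sum(f(x) for x in sample) / len(sample)

def empirical_process(f, sample, Pf):
    """G_n f = sqrt(n) * (P_n f - P f), with the true mean Pf supplied by the caller."""
    n = len(sample)
    return math.sqrt(n) * (empirical_measure(f, sample) - Pf)

def sup_norm(functions_with_means, sample):
    """||G_n||_F: the supremum of |G_n f| over a finite class given as (f, Pf) pairs."""
    return max(abs(empirical_process(f, sample, Pf)) for f, Pf in functions_with_means)

# Hypothetical example: X uniform on [0, 1] and the indicators 1{x <= t} on a grid of t,
# for which P f = t.
rng = random.Random(0)
sample = [rng.random() for _ in range(1000)]
grid = [k / 20 for k in range(1, 20)]
indicator_class = [((lambda x, t=t: 1.0 if x <= t else 0.0), t) for t in grid]
print(sup_norm(indicator_class, sample))
```

For a class of indicators the envelope is $F=1$, so the supremum is trivially at most $\sqrt n$; the maximal inequalities discussed in this note give much sharper bounds on its mean.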
The uniform entropy integral was introduced in [9] and [5], following [3], in
their study of the abstract version of Donsker's theorem. We define an $L_r$-version
of it as

$$J(\delta,\mathcal{F},L_r)=\sup_Q\int_0^\delta\sqrt{1+\log N\bigl(\varepsilon\|F\|_{Q,r},\mathcal{F},L_r(Q)\bigr)}\,d\varepsilon.$$

Here the supremum is taken over all finitely discrete probability distributions
$Q$ on $(\mathcal{X},\mathcal{A})$, the covering number $N(\varepsilon,\mathcal{F},L_r(Q))$ is the minimal number of
balls of radius $\varepsilon$ in $L_r(Q)$ needed to cover $\mathcal{F}$, $F$ is an envelope function of $\mathcal{F}$,
and $\|f\|_{Q,r}$ denotes the norm of a function $f$ in $L_r(Q)$. The integral is defined
relative to an envelope function, which need not be the minimal one, but can
be any measurable function $F\colon\mathcal{X}\to\mathbb{R}$ such that $|f|\le F$ for every $f\in\mathcal{F}$. If
multiple envelope functions are under consideration, then we write $J(\delta,\mathcal{F}|F,L_r)$
to stress this dependence. An inequality, due to Pollard (also see [12], 2.14.1),
says, under some measurability assumptions, that

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J(1,\mathcal{F},L_2)\,\|F\|_{P,2}.\tag{1.1}$$

Here $\lesssim$ means smaller than up to a universal constant. This shows that for a class
$\mathcal{F}$ with finite uniform entropy integral, the supremum $\|\mathbb{G}_n\|_{\mathcal{F}}$ is not essentially
bigger than a multiple of the empirical process at the envelope function
$F$. The inequality is particularly useful if this envelope function is small.

The bracketing entropy integral has its roots in the Donsker theorem of [8],
again following initial work by Dudley. For a given norm $\|\cdot\|$ it can be defined as

$$J_{[\,]}(\delta,\mathcal{F},\|\cdot\|)=\int_0^\delta\sqrt{1+\log N_{[\,]}\bigl(\varepsilon\|F\|,\mathcal{F},\|\cdot\|\bigr)}\,d\varepsilon.$$

Here the bracketing number $N_{[\,]}(\varepsilon,\mathcal{F},\|\cdot\|)$ is the minimal number of brackets
$[l,u]=\{f\colon\mathcal{X}\to\mathbb{R}: l\le f\le u\}$ of size $\|u-l\|$ smaller than $\varepsilon$ needed to cover $\mathcal{F}$.
A useful inequality, due to Pollard (also see [12], 2.14.2), is

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J_{[\,]}(1,\mathcal{F},L_2(P))\,\|F\|_{P,2}.\tag{1.2}$$

Bracketing numbers are bigger than covering numbers (at twice the size), and
hence the bracketing integral is bigger than a multiple of the corresponding
entropy integral. However, the bracketing integral involves only the single
distribution $P$, whereas the uniform entropy integral takes a supremum over all
(discrete) distributions, making the two integrals incomparable in general. Apart
from this difference the two maximal inequalities have the same message.

The two inequalities (1.1) and (1.2) involve the size of the envelope function,
but not the sizes of the individual functions in the class $\mathcal{F}$. They also exploit
finiteness of the entropy integrals only, roughly requiring that the entropy grows
at smaller order than $\varepsilon^{-2}$ as $\varepsilon\downarrow0$, and not the precise size of the entropy. In
the case of the bracketing integral this is remedied in the inequality (see [12],
3.4.2), valid for any class $\mathcal{F}$ of functions $f\colon\mathcal{X}\to[-1,1]$ with $Pf^2<\delta^2PF^2$ and
any $\delta\in(0,1)$,

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J_{[\,]}(\delta,\mathcal{F},L_2(P))\,\|F\|_{P,2}\Bigl(1+\frac{J_{[\,]}(\delta,\mathcal{F},L_2(P))}{\delta^2\sqrt n\,\|F\|_{P,2}}\Bigr).\tag{1.3}$$

Here the assumption that the class of functions is uniformly bounded is too
restrictive for some applications, but can be removed if the entropy integral is
computed relative to the stronger "norm"

$$\|f\|_{P,B}=\bigl(2P(e^{|f|}-1-|f|)\bigr)^{1/2}.$$

Although it is not a norm, this quantity can be used to define the size of brackets
and hence bracketing numbers. Inequality (1.3) is valid for an arbitrary class
of functions with $\|f\|_{P,B}<\delta\|F\|_{P,B}$ if the $L_2(P)$-norm is replaced by $\|\cdot\|_{P,B}$
in its right side (at four appearances) (see Theorem 3.4.3 of [12]). The "norm"
$\|\cdot\|_{P,B}$ derives from the refined version of Bernstein's inequality, which was first
used in the literature on rates of convergence of minimum contrast estimators
in [1] (also see [11]).
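To see why the Bernstein "norm" is called stronger, note that $e^x-1-x\ge x^2/2$ for $x\ge0$, so that $\|f\|_{P,B}\ge\|f\|_{P,2}$. The sketch below evaluates both quantities with $P$ replaced by the uniform distribution on a finite list of function values; the function names and the sample values are hypothetical and serve only to illustrate the comparison.

```python
import math

def l2_norm(values):
    """(P f^2)^(1/2), with P the uniform distribution on the given function values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def bernstein_norm(values):
    """(2 P (e^|f| - 1 - |f|))^(1/2), with P uniform on the given function values."""
    total = sum(math.expm1(abs(v)) - abs(v) for v in values)
    return math.sqrt(2.0 * total / len(values))

small = [0.01, -0.02]      # nearly indistinguishable from the L2 norm
large = [3.0, -1.0, 0.5]   # the exponential term dominates
for values in (small, large):
    print(l2_norm(values), bernstein_norm(values))
```

For small function values the two quantities nearly coincide, while for large values the Bernstein "norm" grows exponentially, which is why it can handle unbounded classes at the price of a stronger moment requirement.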

Maximal inequalities of type (1.3) using uniform entropy are thus far
unavailable. In this note we derive an exact parallel of (1.3) for uniformly bounded
functions, and investigate similar inequalities for unbounded functions. The
validity of these results seems unexpected, as the stronger control given by
bracketing has often been thought necessary for estimates of moduli of continuity. It
was suggested to us by Theorem 3.1 and its proof in [4].

1.1. Application to minimum contrast estimators 

Inequalities involving the sizes of the functions $f$ are of particular interest in
the investigation of empirical minimum contrast estimators. Suppose that $\hat\theta_n$
minimizes a criterion of the type

$$\theta\mapsto\mathbb{P}_nm_\theta,$$

for given measurable functions $m_\theta\colon\mathcal{X}\to\mathbb{R}$ indexed by a parameter $\theta$, and that
the population contrast satisfies, for a "true" parameter $\theta_0$ and some metric $d$
on the parameter set,

$$Pm_\theta-Pm_{\theta_0}\gtrsim d^2(\theta,\theta_0).$$

A bound on the rate of convergence of $\hat\theta_n$ to $\theta_0$ can then be derived from the
modulus of continuity of the empirical process $\mathbb{G}_nm_\theta$ indexed by the functions
$m_\theta$. Specifically (see e.g. [12], 3.2.5) if $\phi_n$ is a function such that $\delta\mapsto\phi_n(\delta)/\delta^\alpha$
is decreasing for some $\alpha<2$ and

$$E\sup_{\theta:\,d(\theta,\theta_0)<\delta}|\mathbb{G}_n(m_\theta-m_{\theta_0})|\lesssim\phi_n(\delta),\tag{1.4}$$

then $d(\hat\theta_n,\theta_0)=O_P(\delta_n)$, for $\delta_n$ any solution to

$$\phi_n(\delta_n)\le\sqrt n\,\delta_n^2.\tag{1.5}$$

Inequality (1.4) involves the empirical process indexed by the class of functions
$\mathcal{M}_\delta=\{m_\theta-m_{\theta_0}: d(\theta,\theta_0)<\delta\}$. If $d$ dominates the $L_2(P)$-norm, or another norm
$\|\cdot\|$ that can be used in an inequality of the type (1.3), such as the Bernstein
norm, and the norms of the envelopes of the classes $\mathcal{M}_\delta$ are bounded in $\delta$, then
we can choose

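$$\phi_n(\delta)=J(\delta,\mathcal{M}_\delta,\|\cdot\|)\Bigl(1+\frac{J(\delta,\mathcal{M}_\delta,\|\cdot\|)}{\delta^2\sqrt n}\Bigr),$$

where $J$ is an appropriate entropy integral. For this choice the inequality (1.5)
is equivalent to

$$J(\delta_n,\mathcal{M}_{\delta_n},\|\cdot\|)\lesssim\sqrt n\,\delta_n^2.\tag{1.6}$$

Thus a rate of convergence can be read off directly from the entropy integral.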

We note that an inequality of type (1.3) is unattractive for very small $\delta$, as
the bound may even increase to infinity as $\delta\downarrow0$. However, it is accurate for the
range of $\delta$ that are important in the application to moduli of continuity.

Moduli of continuity also play an important role in model selection theorems. 
See for instance [7]. 

Inequalities involving uniform entropy permit for instance the immediate
derivation of rates of convergence for minimum contrast functions that form
VC-classes. Furthermore, uniform entropy is preserved under various
(combinatorial) operations to make new classes of functions. This makes uniform entropy
integrals a useful tool in situations where bracketing numbers may be difficult to
handle. Equation (1.6) gives an elegant characterization of rates of convergence
in these situations, where thus far ad-hoc arguments were necessary.
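The rate equation (1.6) can be solved numerically once a form of the entropy integral is fixed. The sketch below assumes, purely for illustration, an entropy integral of the VC type, $J(\delta)=\delta\sqrt{1+V\log(1/\delta)}$ for $\delta\in(0,1]$, with constants set to one; this functional form, the parameter `V` and the function names are assumptions and are not taken from the text. It finds the smallest $\delta$ with $J(\delta)\le\sqrt n\,\delta^2$ by bisection.

```python
import math

def entropy_integral(delta, V):
    """Hypothetical VC-type entropy integral; an assumption for illustration only."""
    return delta * math.sqrt(1.0 + V * math.log(1.0 / delta))

def rate(n, V, tol=1e-12):
    """Smallest delta in (0, 1] with J(delta) <= sqrt(n) * delta**2, up to tol."""
    def gap(delta):
        return math.sqrt(n) * delta ** 2 - entropy_integral(delta, V)
    lo, hi = 1e-12, 1.0
    if gap(hi) < 0 or gap(lo) >= 0:
        raise ValueError("no sign change on the search interval")
    # gap(delta) / delta is increasing in delta, so there is a single crossing.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

for n in (100, 10_000, 1_000_000):
    print(n, rate(n, V=1.0))
```

Under this assumed form the solution is of order $\sqrt{\log n/n}$ for fixed `V`, which is the kind of rate one reads off from (1.6) without further ad-hoc arguments.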



2. Uniformly Bounded Classes 

Call the class $\mathcal{F}$ of functions $P$-measurable if the map

$$(X_1,\ldots,X_n)\mapsto\sup_{f\in\mathcal{F}}\Bigl|\sum_{i=1}^ne_if(X_i)\Bigr|$$

on the completion of the probability space $(\mathcal{X}^n,P^n)$ is measurable, for every
sequence $e_1,e_2,\ldots,e_n\in\{-1,1\}$.

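Theorem 2.1. Let $\mathcal{F}$ be a $P$-measurable class of measurable functions with
envelope function $F\le1$ and such that $\mathcal{F}^2$ is $P$-measurable. If $Pf^2<\delta^2PF^2$,
for every $f$ and some $\delta\in(0,1)$, then

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J(\delta,\mathcal{F},L_2)\Bigl(1+\frac{J(\delta,\mathcal{F},L_2)}{\delta^2\sqrt n\,\|F\|_{P,2}}\Bigr)\|F\|_{P,2}.$$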

Proof. We use the following refinement of (1.1) (see e.g. [12], 2.14.1): for any
$P$-measurable class $\mathcal{F}$,

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim E_P^*\,J\Bigl(\frac{\sup_f(\mathbb{P}_nf^2)^{1/2}}{(\mathbb{P}_nF^2)^{1/2}},\mathcal{F},L_2\Bigr)(\mathbb{P}_nF^2)^{1/2}.\tag{2.1}$$

Because $\delta\mapsto J(\delta,\mathcal{F},L_2)$ is the integral of a nonincreasing nonnegative function,
it is a concave function such that the map $t\mapsto J(t)/t$, which is the average of its
derivative over $[0,t]$, is nonincreasing. The concavity shows that its perspective
$(x,t)\mapsto tJ(x/t,\mathcal{F},L_2)$ is a concave function of its two arguments (cf. [2], page
89). Furthermore, the "extended-value extension" of this function (which by
definition is $-\infty$ if $x\le0$ or $t\le0$) is obviously nondecreasing in its first
argument and was noted to be nondecreasing in its second argument. Therefore,
by the vector composition rules for concave functions ([2], pages 83-87, especially
lines -2 and -1 of page 86), the function $(x,y)\mapsto H(x,y):=J(\sqrt{x/y},\mathcal{F},L_2)\sqrt y$
is concave. We have that $E_P\mathbb{P}_nF^2=\|F\|_{P,2}^2$. Therefore, by an application of
Jensen's inequality to the right side of the preceding display we obtain, for
$\sigma_n^2=\sup_f\mathbb{P}_nf^2$,

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J\Bigl(\frac{\sqrt{E_P^*\sigma_n^2}}{\|F\|_{P,2}},\mathcal{F},L_2\Bigr)\|F\|_{P,2}.\tag{2.2}$$

The application of Jensen's inequality with outer expectations can be justified
here by the monotonicity of the function $H$, which shows that the measurable
majorant of a variable $H(U,V)$ is bounded above by $H(U^*,V^*)$, for $U^*$ and $V^*$
measurable majorants of $U$ and $V$. Thus $E^*H(U,V)\le EH(U^*,V^*)$, after which
Jensen's inequality can be applied in its usual (measurable) form.

The second step of the proof is to bound $E_P^*\sigma_n^2$. Because $\mathbb{P}_nf^2=Pf^2+n^{-1/2}\mathbb{G}_nf^2$
and $Pf^2\le\delta^2PF^2$ for every $f$, we have

$$E_P^*\sigma_n^2\le\delta^2\|F\|_{P,2}^2+\frac1{\sqrt n}E_P^*\|\mathbb{G}_n\|_{\mathcal{F}^2}.\tag{2.3}$$

Here the empirical process in the second term can be replaced by the
symmetrized empirical process $\mathbb{G}_n^o$ (defined as $\mathbb{G}_n^of=n^{-1/2}\sum_{i=1}^n\varepsilon_if(X_i)$ for
independent Rademacher variables $\varepsilon_1,\varepsilon_2,\ldots,\varepsilon_n$) at the cost of adding a
multiplicative factor 2 (e.g. [12], 2.3.1). The expectation can be factorized as the
expectation on the Rademacher variables $\varepsilon$ followed by the expectation on $X_1,\ldots,X_n$,
and $E_\varepsilon\|\mathbb{G}_n^o\|_{\mathcal{F}^2}\le2E_\varepsilon\|\mathbb{G}_n^o\|_{\mathcal{F}}$ by the contraction principle for Rademacher
variables ([6], Theorem 4.12), and the fact that $F\le1$ by assumption. Taking the
expectation on $X_1,\ldots,X_n$, we obtain that $E_P^*\|\mathbb{G}_n\|_{\mathcal{F}^2}\le4E_P^*\|\mathbb{G}_n^o\|_{\mathcal{F}}$, which in
turn is bounded above by $8E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}$ by the desymmetrization inequality (e.g.
2.36 in [12]).

Thus $\mathcal{F}^2$ in the last term of (2.3) can be replaced by $\mathcal{F}$ at the cost of
inserting a constant. Next we apply (2.2) to this term, and conclude that
$z^2:=E_P^*\sigma_n^2/\|F\|_{P,2}^2$ satisfies the inequality

$$z^2\lesssim\delta^2+\frac{J(z,\mathcal{F},L_2)}{\sqrt n\,\|F\|_{P,2}}.\tag{2.4}$$

We apply Lemma 2.1 with $r=1$, $A=\delta$ and $B^2=1/(\sqrt n\,\|F\|_{P,2})$ to see that

$$J(z,\mathcal{F},L_2)\lesssim J(\delta,\mathcal{F},L_2)+\frac{J^2(\delta,\mathcal{F},L_2)}{\delta^2\sqrt n\,\|F\|_{P,2}}.$$

We insert this in (2.2) to complete the proof. ■

Lemma 2.1. Let $J\colon(0,\infty)\to\mathbb{R}$ be a concave, nondecreasing function with
$J(0)=0$. If $z^2\le A^2+B^2J(z^r)$ for some $r\in(0,2)$ and $A,B>0$, then

$$J(z)\lesssim J(A)\Bigl[1+J(A^r)\Bigl(\frac BA\Bigr)^2\Bigr]^{1/(2-r)}.$$



Proof. For $t>s>0$ we can write $s$ as the convex combination $s=(s/t)t+(1-s/t)0$
of $t$ and 0. Since $J(0)=0$, the concavity of $J$ gives that $J(s)\ge(s/t)J(t)$.
Thus the function $t\mapsto J(t)/t$ is decreasing, which implies that $J(Ct)\le CJ(t)$
for $C\ge1$ and any $t>0$.

By the monotonicity of $J$ and the assumption on $z$ it follows that

$$J(z^r)\le J\bigl((A^2+B^2J(z^r))^{r/2}\bigr)\le J(A^r)\Bigl(1+\Bigl(\frac BA\Bigr)^2J(z^r)\Bigr)^{r/2}.$$

This implies that $J(z^r)$ is bounded by a multiple of the maximum of $J(A^r)$ and
$J(A^r)(B/A)^rJ(z^r)^{r/2}$. If it is bounded by the second one, then
$J(z^r)^{1-r/2}\lesssim J(A^r)(B/A)^r$. We conclude that

$$J(z^r)\lesssim J(A^r)+J(A^r)^{2/(2-r)}\Bigl(\frac BA\Bigr)^{2r/(2-r)}.$$

Next again by the monotonicity of $J$,

$$\begin{aligned}J(z)\le J\bigl(\sqrt{A^2+B^2J(z^r)}\bigr)&\lesssim J(A)\Bigl[1+\Bigl(\frac BA\Bigr)^2J(A^r)+\Bigl(\frac BA\Bigr)^{2+2r/(2-r)}J(A^r)^{2/(2-r)}\Bigr]^{1/2}\\&\lesssim J(A)\Bigl[1+\frac BA\sqrt{J(A^r)}+\Bigl(\frac BA\Bigr)^{2/(2-r)}J(A^r)^{1/(2-r)}\Bigr].\end{aligned}$$

The middle term on the right side is bounded by a multiple of the sum of the
first and third terms, since $x\le1^p+x^q$ for any conjugate pair $(p,q)$ and any
$x>0$, in particular $x=\sqrt{J(A^r)}\,B/A$. ■
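The lemma can be checked numerically in the case $r=1$ used in the proof of Theorem 2.1. The sketch below is an illustration under stated assumptions: the two test functions $J(t)=\sqrt t$ and $J(t)=\log(1+t)$ (both concave, nondecreasing, with $J(0)=0$), the values of $A$ and $B$, and the helper names are all hypothetical choices. For $r=1$ the map $z\mapsto z^2-A^2-B^2J(z)$ is convex and negative at 0, so the admissible $z$ form an interval whose right end point can be found by bisection.

```python
import math

def largest_z(J, A, B, hi=1e6, iters=200):
    """Approximate the largest z >= 0 with z**2 <= A**2 + B**2 * J(z) (the case r = 1)."""
    def g(z):
        return z * z - A * A - B * B * J(z)
    if g(hi) <= 0:
        raise ValueError("increase hi: the constraint still holds at the upper end")
    lo = 0.0  # g(0) = -A**2 < 0 because J(0) = 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return lo

def lemma_bound(J, A, B):
    """Right side of Lemma 2.1 for r = 1, without the unspecified constant."""
    return J(A) * (1.0 + J(A) * (B / A) ** 2)

for J in (math.sqrt, math.log1p):
    for A, B in [(0.1, 1.0), (1.0, 1.0), (10.0, 0.1), (0.5, 5.0)]:
        z = largest_z(J, A, B)
        print(J.__name__, A, B, J(z) / lemma_bound(J, A, B))
```

Tracing the constants through the proof for $r=1$ gives $J(z)\le J(A)(1+2J(A)(B/A)^2)$, so the printed ratios should not exceed 2; this explicit constant is our own bookkeeping and is not stated in the text.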

For values of $\delta$ such that $\delta\|F\|_{P,2}\ll1/\sqrt n$ Theorem 2.1 can be improved.
(This seems not to be of prime interest for statistical applications.) Its bound can
be written in the form $J(\delta,\mathcal{F},L_2)\|F\|_{P,2}+J^2(\delta,\mathcal{F},L_2)/(\delta^2\sqrt n)$. In the second
term $\delta$ can be replaced by $1/(\|F\|_{P,2}\sqrt n)$, which is better if $\delta$ is smaller than
the latter number, as the function $\delta\mapsto J(\delta,\mathcal{F},L_2)/\delta$ is decreasing.

Lemma 2.2. Under the conditions of Theorem 2.1,

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J(\delta,\mathcal{F},L_2)\,\|F\|_{P,2}+J^2\Bigl(\frac1{\sqrt n\,\|F\|_{P,2}},\mathcal{F},L_2\Bigr)\sqrt n\,\|F\|_{P,2}^2.$$

Proof. We follow the proof of Theorem 2.1 up to (2.4), but next use the
alternative bounds

$$J(z,\mathcal{F},L_2)\lesssim J\Bigl(\sqrt{\delta^2+\delta_nJ(z,\mathcal{F},L_2)},\mathcal{F},L_2\Bigr)\le J(\delta,\mathcal{F},L_2)+J(\delta_n,\mathcal{F},L_2)+J(\delta_n,\mathcal{F},L_2)\sqrt{\frac{J(z,\mathcal{F},L_2)}{\delta_n}},$$

for $1/\delta_n=\sqrt n\,\|F\|_{P,2}$. Here we have used the subadditivity of the map
$\delta\mapsto J(\delta,\mathcal{F},L_2)$, and the inequality $J(C\delta,\mathcal{F},L_2)\le CJ(\delta,\mathcal{F},L_2)$ for $C\ge1$ in the last
step. We can bound the sum of the three terms on the right side by a multiple
of the maximum of these terms and conclude that the left side is smaller than
a multiple of at least one of the three terms. Solving next yields that

$$J(z,\mathcal{F},L_2)\lesssim J(\delta,\mathcal{F},L_2)\vee\frac{J^2(\delta_n,\mathcal{F},L_2)}{\delta_n}\vee J(\delta_n,\mathcal{F},L_2).$$

Because $J(\delta_n,\mathcal{F},L_2)\ge\delta_n$ for every $\delta_n>0$, by the definition of the entropy
integral, the third term on the right is bounded by the second term. We substitute
the bound in (2.2) to finish the proof. ■

3. Unbounded Classes 

In this section we investigate relaxations of the assumption that the class of 
functions is uniformly bounded, made in Theorem 2.1. We start with a moment 
bound on the envelope. 

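Theorem 3.1. Let $\mathcal{F}$ be a $P$-measurable class of measurable functions with
envelope function $F$ such that $PF^{(4p-2)/(p-1)}<\infty$ for some $p>1$ and such
that $\mathcal{F}^2$ and $\mathcal{F}^4$ are $P$-measurable. If $Pf^2<\delta^2PF^2$ for every $f$ and some
$\delta\in(0,1)$, then

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\lesssim J(\delta,\mathcal{F},L_2)\Bigl(1+\frac{J(\delta^{1/p},\mathcal{F},L_2)}{\delta^2\sqrt n}\,\frac{\|F\|_{P,(4p-2)/(p-1)}^{2-1/p}}{\|F\|_{P,2}^{2-1/p}}\Bigr)^{p/(2p-1)}\|F\|_{P,2}.$$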

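Proof. Application of (2.1) to the functions $f^2$ forming the class $\mathcal{F}^2$ with
envelope function $F^2$, yields

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}^2}\lesssim E_P^*\,J\Bigl(\frac{\sigma_{n,4}^2}{(\mathbb{P}_nF^4)^{1/2}},\mathcal{F}^2|F^2,L_2\Bigr)(\mathbb{P}_nF^4)^{1/2},\tag{3.1}$$

for $\sigma_{n,r}$ the diameter of $\mathcal{F}$ in $L_r(\mathbb{P}_n)$, i.e.

$$\sigma_{n,r}^r=\sup_f\mathbb{P}_n|f|^r.\tag{3.2}$$

Preservation properties of uniform entropy (see [10], or [12], 2.10.20, where
the supremum over $Q$ can also be moved outside the integral to match our
current definition of entropy integral, applied to $\phi(f)=f^2$ with $L=2F$)
show that $J(\delta,\mathcal{F}^2|F^2,L_2)\lesssim J(\delta,\mathcal{F}|F,L_2)$, for every $\delta>0$. Because
$\mathbb{P}_nf^2=Pf^2+n^{-1/2}\mathbb{G}_nf^2$ and $Pf^2\le\delta^2PF^2$ by assumption, we find that

$$E_P^*\sigma_{n,2}^2\lesssim\delta^2PF^2+\frac1{\sqrt n}E_P^*\,J\Bigl(\frac{\sigma_{n,4}^2}{(\mathbb{P}_nF^4)^{1/2}},\mathcal{F},L_2\Bigr)(\mathbb{P}_nF^4)^{1/2}.\tag{3.3}$$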

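The next step is to bound $\sigma_{n,4}$ in terms of $\sigma_{n,2}$.

By Hölder's inequality, for any conjugate pair $(p,q)$ and any $0<s<4$,

$$\mathbb{P}_nf^4\le\mathbb{P}_n|f|^{4-s}F^s\le\bigl(\mathbb{P}_n|f|^{(4-s)p}\bigr)^{1/p}\bigl(\mathbb{P}_nF^{sq}\bigr)^{1/q}.$$

Choosing $s$ such that $(4-s)p=2$ (and hence $sq=(4p-2)/(p-1)$), we find
that

$$\sigma_{n,4}^4\le\sigma_{n,2}^{2/p}\bigl(\mathbb{P}_nF^{sq}\bigr)^{1/q}.$$

We insert this bound in (3.3). The function $(x,y)\mapsto x^{1/p}y^{1/q}$ is concave, and
hence the function $(x,y,z)\mapsto J\bigl(\sqrt{x^{1/p}y^{1/q}/z},\mathcal{F},L_2\bigr)\sqrt z$ can be seen to be
concave by the same arguments as in the proof of Theorem 2.1. Therefore, we can
apply Jensen's inequality to see that

$$E_P^*\sigma_{n,2}^2\lesssim\delta^2PF^2+\frac1{\sqrt n}J\Bigl(\frac{(E_P^*\sigma_{n,2}^2)^{1/(2p)}(PF^{sq})^{1/(2q)}}{(PF^4)^{1/2}},\mathcal{F},L_2\Bigr)(PF^4)^{1/2}.$$

We conclude that $z:=(E_P^*\sigma_{n,2}^2)^{1/2}/\|F\|_{P,2}$ satisfies

$$\begin{aligned}z^2&\lesssim\delta^2+\frac1{\sqrt n}J\Bigl(z^{1/p}\frac{(PF^2)^{1/(2p)}(PF^{sq})^{1/(2q)}}{(PF^4)^{1/2}},\mathcal{F},L_2\Bigr)\frac{(PF^4)^{1/2}}{PF^2}\\&\lesssim\delta^2+J(z^{1/p},\mathcal{F},L_2)\,\frac{(PF^{sq})^{1/(2q)}}{\sqrt n\,(PF^2)^{1-1/(2p)}}.\end{aligned}$$

In the last step we use that $J(C\delta,\mathcal{F},L_2)\le CJ(\delta,\mathcal{F},L_2)$ for $C\ge1$, and Hölder's
inequality as previously to see that the present $C$ satisfies this condition. We
next apply Lemma 2.1 (with $r=1/p$) to obtain a bound on $J(z,\mathcal{F},L_2)$, and
conclude the proof by substituting this bound in (2.2). ■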
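The exponent bookkeeping in the Hölder step is easy to get wrong, so the following small sketch recomputes it with exact rational arithmetic. The function name `holder_exponents` is a hypothetical helper; the formulas $q=p/(p-1)$, $s=4-2/p$ and $sq=(4p-2)/(p-1)$ are those of the proof above.

```python
from fractions import Fraction

def holder_exponents(p):
    """Return (q, s, s*q) for the Holder step: q conjugate to p and (4 - s) * p = 2."""
    p = Fraction(p)
    if p <= 1:
        raise ValueError("p must exceed 1")
    q = p / (p - 1)   # conjugate exponent, 1/p + 1/q = 1
    s = 4 - 2 / p     # chosen so that (4 - s) * p = 2
    return q, s, s * q

for p in (Fraction(11, 10), Fraction(3, 2), 2, 4):
    q, s, moment = holder_exponents(p)
    print(p, q, s, moment)
```

The last column is the order of the envelope moment required in Theorem 3.1; it grows without bound as $p\downarrow1$, in line with the remark that follows.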

The preceding theorem assumes only a finite moment of the envelope
function, but in comparison to Theorem 2.1 substitutes $J(\delta^{1/p},\mathcal{F},L_2)$ in the
correction term of the upper bound, where $p>1$ and hence $\delta^{1/p}\gg\delta$ for small
$\delta$. In applications to moduli of continuity of minimum contrast criteria this is
sufficient to obtain consistency with a rate, but typically the rate will be
suboptimal. The rate improves as $p\downarrow1$, which requires finite moments of the envelope
function of order increasing to infinity, the limiting case $p=1$ corresponding
to a bounded envelope, as in Theorem 2.1. The following theorem interpolates
between finite moments of any order and a bounded envelope function. If
applied to obtaining rates of convergence it gives rates that are optimal up to a
logarithmic factor.

Theorem 3.2. Let $\mathcal{F}$ be a $P$-measurable class of measurable functions with
envelope function $F$ such that $P\exp(F^{p+\rho})<\infty$ for some $p,\rho>0$ and such
that $\mathcal{F}^2$ and $\mathcal{F}^4$ are $P$-measurable. If $Pf^2<\delta^2PF^2$ for every $f$ and some
$\delta\in(0,1/2)$, then for a constant $c$ depending on $p$, $PF^2$, $PF^4$ and $P\exp(F^{p+\rho})$,

$$E_P^*\|\mathbb{G}_n\|_{\mathcal{F}}\le c\,J(\delta,\mathcal{F},L_2)\Bigl(1+\frac{J\bigl(\delta\log^{1/p}(1/\delta),\mathcal{F},L_2\bigr)}{\delta^2\sqrt n}\Bigr).$$

Proof. Fix $r=2/p$. The functions $\psi,\bar\psi\colon[0,\infty)\to[0,\infty)$ defined by

$$\psi(f)=\log^r(1+f),\qquad\bar\psi(f)=e^{f^{1/r}}-1,$$

are each other's inverses, and are increasing from $\psi(0)=\bar\psi(0)=0$ to infinity.
Thus their primitive functions $\Psi(f)=\int_0^f\psi(s)\,ds$ and $\bar\Psi(f)=\int_0^f\bar\psi(s)\,ds$ satisfy
Young's inequality $fg\le\Psi(f)+\bar\Psi(g)$, for every $f,g\ge0$ (e.g. [2], page 120,
3.38).

The function $t\mapsto t\log^r(1/t)$ is concave in a neighbourhood of 0 (specifically:
on the interval $(0,e^{1-r}\wedge1)$), with limit from the right equal to 0 at 0,
and derivative tending to infinity at this point. Therefore, there exists a
concave, increasing function $k\colon(0,\infty)\to(0,\infty)$ that is identical to $t\mapsto t\log^r(1/t)$
near 0 and bounded below and above by a positive constant times the identity
throughout its domain. (E.g. extend $t\mapsto t\log^r(1/t)$ linearly with slope 1 from
the point where the derivative of the latter function has decreased to 1.) Write
$k(t)=t\ell^r(t)$, so that $\ell$ is bounded below by a constant and $\ell(t)=\log(1/t)$
near 0. Then, for every $t>0$,

$$\frac{\log(2+t/C)}{\ell(C)}\lesssim\log(2+t).\tag{3.4}$$

(The constant in $\lesssim$ may depend on $r$.) To see this, note that for $C>c$ the left
side is bounded by a multiple of $\log(2+t/c)$, whereas for small $C$ the left side
is bounded by a multiple of $[\log(2+t)+\log(1+1/C)]/\ell(C)\lesssim\log(2+t)+1$.
From the inequality $\Psi(f)\le f\psi(f)$, we obtain that, for $f>0$,

$$\Psi\Bigl(\frac f{\log^r(2+f)}\Bigr)\lesssim f.$$

Therefore, by (3.4) followed by Young's inequality,

$$\frac{f^4}{k(C^2)}=\frac{f^2/C^2}{\log^r(2+f^2/C^2)}\cdot\frac{f^2\log^r(2+f^2/C^2)}{\ell^r(C^2)}\lesssim\frac{f^2}{C^2}+\bar\Psi\bigl(F^2\log^r(2+F^2)\bigr).$$

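On integrating this with respect to the empirical measure, with $C^2=\mathbb{P}_nf^2$, we
see that, with $G=\bar\Psi\bigl(F^2\log^r(2+F^2)\bigr)$,

$$\mathbb{P}_nf^4\lesssim k(\mathbb{P}_nf^2)(1+\mathbb{P}_nG).$$

We take the supremum over $f$ to bound $\sigma_{n,4}^4$ as in (3.2) in terms of $k(\sigma_{n,2}^2)$, and
next substitute this bound in (3.3) to find that

$$E_P^*\sigma_{n,2}^2\lesssim\delta^2PF^2+\frac1{\sqrt n}J\Bigl(\frac{\sqrt{k(E_P^*\sigma_{n,2}^2)(1+PG)}}{(PF^4)^{1/2}},\mathcal{F},L_2\Bigr)(PF^4)^{1/2},$$

where we have used the concavity of $k$, and the concavity of the other maps,
as previously. By assumption the expected value $PG$ is finite for $r=2/p$. It
follows that $z^2=E_P^*\sigma_{n,2}^2/PF^2$ satisfies, for suitable constants $a,b,c$ depending
on $r$, $PF^2$, $PF^4$ and $PG$,

$$z^2\lesssim\delta^2+\frac a{\sqrt n}J\bigl(\sqrt{k(z^2b)c},\mathcal{F},L_2\bigr).$$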

By concavity and the fact that $k(0)=0$, we have $k(Cz)\le Ck(z)$, for $C\ge1$ and
$z>0$. The function $z\mapsto\sqrt{k(z^2b)c}$ inherits this property. Therefore we can apply
Lemma 3.1, with $k$ of the lemma equal to the present function $z\mapsto\sqrt{k(z^2b)c}$, to
obtain a bound on $J(z,\mathcal{F},L_2)$ in terms of $J(\delta,\mathcal{F},L_2)$ and $J\bigl(\sqrt{k(\delta^2b)c},\mathcal{F},L_2\bigr)$,
which we substitute in (2.2). Here $k(\delta^2)=\delta^2\log^r(1/\delta^2)\lesssim\delta^2\log^r(1/\delta)$ for sufficiently small
$\delta>0$ and $k(\delta^2)\lesssim\delta^2\lesssim\delta^2\log^r(1/\delta)$ for $\delta<1/2$ and bounded away from 0.
Thus we can simplify the bound to the one in the statement of the theorem,
possibly after increasing the constants $a,b,c$ to be at least 1, to complete the
proof. ■

Lemma 3.1. Let $J\colon(0,\infty)\to\mathbb{R}$ be a concave, nondecreasing function with
$J(0)=0$, and let $k\colon(0,\infty)\to(0,\infty)$ be nondecreasing and satisfy $k(Cz)\le Ck(z)$
for $C\ge1$ and $z>0$. If $z^2\le A^2+B^2J(k(z))$ for some $A,B>0$, then

$$J(z)\lesssim J(A)\Bigl[1+J(k(A))\Bigl(\frac BA\Bigr)^2\Bigr].$$



Proof. As noted in the proof of Lemma 2.1 the properties of $J$ imply that
$J(Cz)\le CJ(z)$ for $C\ge1$ and any $z>0$. In view of the assumed property of $k$
and the monotonicity of $J$ it follows that $J\circ k(Cz)\le CJ\circ k(z)$ for every $C\ge1$
and $z>0$. Therefore, by the monotonicity of $J$ and $k$, and the assumption on $z$,

$$J\circ k(z)\le J\circ k\bigl(\sqrt{A^2+B^2J\circ k(z)}\bigr)\le J\circ k(A)\sqrt{1+(B/A)^2J\circ k(z)}.$$

As in the proof of Lemma 2.1 we can solve this for $J\circ k(z)$ to find that

$$J\circ k(z)\lesssim J\circ k(A)+J\circ k(A)^2\Bigl(\frac BA\Bigr)^2.$$

Next again by the monotonicity of $J$,

$$\begin{aligned}J(z)\le J\bigl(\sqrt{A^2+B^2J\circ k(z)}\bigr)&\le J(A)\sqrt{1+(B/A)^2J\circ k(z)}\\&\lesssim J(A)\Bigl[1+\frac BA\sqrt{J\circ k(A)}+\Bigl(\frac BA\Bigr)^2J\circ k(A)\Bigr].\end{aligned}$$

The middle term on the right side is bounded by the sum of the first and third
terms. ■